Chinese-English Backward Transliteration Assisted with Mining Monolingual Web Pages
Abstract
In this paper, we present a novel backward transliteration approach that assists an existing statistical model by mining monolingual web resources. First, we use syllable-based search to revise the transliteration candidates produced by the statistical model. By mapping all candidates onto existing words, we can filter out or correct pseudo candidates and improve overall recall. Second, an AdaBoost model reranks the revised candidates based on information extracted from monolingual web pages. To improve precision during reranking, a variety of web-based information is used to adjust the ranking score, so that candidates that are less likely to be transliterated names are assigned lower ranks. Experimental results show that the proposed framework significantly outperforms the baseline transliteration system in both precision and recall.
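The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: plain string similarity from `difflib` stands in for the syllable-based search, the AdaBoost learner is a small discrete variant over one-feature threshold stumps, and every name, feature, and number below (the vocabulary, the two-feature vectors, the training labels) is hypothetical toy data chosen only to show the flow from revision to reranking.

```python
import math
from difflib import get_close_matches


def revise_candidates(candidates, vocabulary, cutoff=0.7):
    """Map each raw candidate onto its closest known word; drop unmatched ones.

    Assumption: string similarity replaces the paper's syllable-based search.
    """
    revised = []
    for cand in candidates:
        match = get_close_matches(cand, vocabulary, n=1, cutoff=cutoff)
        if match and match[0] not in revised:
            revised.append(match[0])
    return revised


def train_adaboost(X, y, rounds=3):
    """Discrete AdaBoost over threshold stumps; labels must be +1 or -1."""
    n = len(X)
    weights = [1.0 / n] * n
    ensemble = []  # entries: (alpha, feature index, threshold, polarity)
    for _ in range(rounds):
        best = None
        for f in range(len(X[0])):
            for thr in sorted({row[f] for row in X}):
                for pol in (1, -1):
                    preds = [pol if row[f] >= thr else -pol for row in X]
                    err = sum(w for w, p, t in zip(weights, preds, y) if p != t)
                    if best is None or err < best[0]:
                        best = (err, f, thr, pol, preds)
        err, f, thr, pol, preds = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, f, thr, pol))
        # Raise the weight of misclassified examples, then renormalize.
        weights = [w * math.exp(-alpha * t * p) for w, t, p in zip(weights, y, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def score(ensemble, features):
    """Weighted vote of all stumps; higher means more likely a real name."""
    return sum(a * (pol if features[f] >= thr else -pol)
               for a, f, thr, pol in ensemble)


def rerank(candidates, features_by_candidate, ensemble):
    """Sort candidates by descending ensemble score."""
    return sorted(candidates,
                  key=lambda c: score(ensemble, features_by_candidate[c]),
                  reverse=True)


# Hypothetical toy data: [statistical model score, normalized web evidence].
train_X = [[0.9, 0.8], [0.4, 0.9], [0.8, 0.1], [0.3, 0.2], [0.6, 0.7], [0.7, 0.05]]
train_y = [1, 1, -1, -1, 1, -1]

vocabulary = ["clinton", "clifton", "lincoln"]
revised = revise_candidates(["klinton", "clifdon", "qlxzvv"], vocabulary)

ensemble = train_adaboost(train_X, train_y)
web_features = {"clinton": [0.6, 0.9], "clifton": [0.8, 0.1]}
ranked = rerank(revised, web_features, ensemble)
print(revised, ranked)
```

In this toy run the unmatched string is filtered out during revision, and the candidate with stronger web evidence is ranked first even though its statistical score is lower, which mirrors the role the abstract assigns to web-based information.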
Related resources
Semi-Supervised Lexicon Mining from Parenthetical Expressions in Monolingual Web Pages
This paper presents a semi-supervised learning framework for mining Chinese-English lexicons from a large number of Chinese Web pages. The issue is motivated by the observation that many Chinese neologisms are accompanied by their English translations in parentheses. We classify parenthetical translations into bilingual abbreviations, transliterations, and translations. A frequency-ba...
Hypothesis Selection in Machine Transliteration: A Web Mining Approach
We propose a new method of selecting hypotheses for machine transliteration. We generate a set of Chinese, Japanese, and Korean transliteration hypotheses for a given English word. We then use the set of transliteration hypotheses as a guide to finding relevant Web pages and mining contextual information for the transliteration hypotheses from the Web page. Finally, we use the mined information...
A System to Mine Large-Scale Bilingual Dictionaries from Monolingual Web Pages
This paper describes a system that automatically mines English-Chinese translation pairs from a large number of monolingual Chinese web pages. Our approach is motivated by the observation that many Chinese terms (e.g., named entities that are not stored in a conventional dictionary) are accompanied by their English translations in the Chinese web pages. In our approach, candidate translations are ...
Mining Name Translations from Entity Graph Mapping
This paper studies the problem of mining entity translation, specifically, mining English and Chinese name pairs. Existing efforts can be categorized into (a) a transliteration-based approach leveraging phonetic similarity and (b) a corpus-based approach exploiting bilingual co-occurrences, each of which suffers from inaccuracy and scarcity, respectively. In clear contrast, we use unleveraged res...
Chinese-to-English Backward Machine Transliteration
It is challenging to transliterate named entities across languages. It is even more challenging to backward transliterate a transliterated term into its original form. This paper addresses the problem of backward translating person names from Chinese to their English counterparts. We propose a statistical backward transliteration method. Our method uses English sub-syllable and Chinese syllable a...
Publication date: 2008